A new constraint for mining sets in sequences
Abstract
Discovering interesting episodes is a popular area of temporal or sequential data mining, with applications such as mining text or protein sequences. In such data, the order in which the events appear is analysed, and the user’s goal is to identify regularities in a dataset consisting of one or more sequences. The usual approach to episode discovery is to look for episodes consisting of events that frequently appear close to each other. Most current state-of-the-art methods first use a window of fixed length to find sufficiently cohesive episodes and then retrieve those that occur in more windows (or sequences) than a given minimum threshold. The frequency of an itemset X, fr(X), is thus defined as the number of windows in which X appears divided by the total number of possible windows. The use of a fixed-length window is a major limitation of such approaches, as no episode longer than the window can ever be discovered. To remove this limitation, a different method has been proposed that increases the window length in proportion to the size of the candidate set. Even in that proposal, however, the window length remains fixed for a particular candidate while its frequency in the sequence is counted. Hence, occurrences of the episode that span a time frame larger than the window are disregarded. Moreover, a high frequency of a set of events appearing close together gives no guarantee that a subset of that set will not sometimes appear far away from the rest of the set. Take, for example, the following sequence:
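The example sequence from the original text is not reproduced on this page. As a stand-in, the window-based frequency fr(X) described in the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `window_frequency`, the toy sequence, and the choice to count only windows that lie fully inside the sequence are all assumptions made here.

```python
from typing import Sequence, Set


def window_frequency(sequence: Sequence[str], itemset: Set[str], window: int) -> float:
    """Return fr(X): windows containing every event of `itemset`, divided by all windows.

    Assumption: only windows lying fully inside the sequence are counted.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    n_windows = len(sequence) - window + 1
    if n_windows <= 0:
        return 0.0  # the sequence is shorter than the window
    hits = sum(
        1
        for start in range(n_windows)
        if itemset <= set(sequence[start:start + window])  # subset test
    )
    return hits / n_windows


# Hypothetical toy sequence (not taken from the source text).
toy = list("abcdabxxxc")

fr_ab = window_frequency(toy, {"a", "b"}, window=3)        # 3 of 8 windows
fr_abc = window_frequency(toy, {"a", "b", "c"}, window=3)  # 1 of 8 windows
```

In this toy sequence, the second a, b pair is counted for {a, b}, but the later c lies outside every length-3 window that covers that pair, so that occurrence of {a, b, c} is disregarded. This is the fixed-window limitation the abstract describes.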
Similar resources
A new method for 3-D magnetic data inversion with physical bound
Inversion of magnetic data is an important step towards interpretation of the practical data. Smooth inversion is a common technique for the inversion of data. Physical bound constraint can improve the solution to the magnetic inverse problem. However, how to introduce the bound constraint into the inversion procedure is important. Imposing bound constraint makes the magnetic data inversion a n...
Convex Generalized Semi-Infinite Programming Problems with Constraint Sets: Necessary Conditions
We consider generalized semi-infinite programming problems in which the index set of the inequality constraints depends on the decision vector and all emerging functions are assumed to be convex. Considering a lower level constraint qualification, we derive a formula for estimating the subdifferential of the value function. Finally, we establish the Fritz-John necessary optimality con...
Constraint-Based Pattern Set Mining
Local pattern mining algorithms generate sets of patterns, which are typically not directly useful and have to be further processed before actual application or interpretation. Rather than investigating each pattern individually at the local level, we propose to mine for global models directly. A global model is essentially a pattern set that is interpreted as a disjunction of these patterns. I...
Constraint-Based Mining of Formal Concepts in Transactional Data
We are designing new data mining techniques on boolean contexts to identify a priori interesting concepts, i.e., closed sets of objects (or transactions) and associated closed sets of attributes (or items). We propose a new algorithm D-Miner for mining concepts under constraints. We provide an experimental comparison with previous algorithms and an application to an original microarray dataset ...
High Fuzzy Utility Based Frequent Patterns Mining Approach for Mobile Web Services Sequences
High fuzzy utility based pattern mining is an emerging topic in data mining. It refers to discovering all patterns whose utility meets a user-specified minimum high utility threshold. It comprises extracting patterns which are highly accessed in mobile web service sequences. Different from the traditional fuzzy approach, high fuzzy utility mining considers not only counts of mob...